Sinusoidal Approach for the Single-Channel Speech Separation and Recognition Challenge
Authors
Abstract
Most single-channel speech separation (SCSS) systems use the short-time Fourier transform as their parametric features. Recent studies have shown that employing sinusoidal features for the SCSS application results in high perceived speech quality. In this paper, we systematically study the automatic speech recognition results of an SCSS system that uses sinusoidal features composed of amplitude and frequency. We compare the speech recognition results with those already reported by other participants in the single-channel speech separation and recognition challenge. Our results show that the newly proposed system achieves an overall recognition accuracy of 52.3%, which places it at the median among all other participants in the challenge.
Similar resources
The 2nd 'CHiME' Speech Separation and Recognition Challenge: Approaches on Single-Channel Source Separation and Model-Driven Speech Enhancement
In this paper, we address the small vocabulary track (track 1) of the CHiME 2 challenge, which is dedicated to recognizing utterances of a target speaker with small head movements. The utterances are recorded in reverberant room acoustics and corrupted with highly non-stationary noise sources. Such an adverse noise scenario poses a challenge to state-of-the-art automatic speech recognition systems. ...
Unconstrained Speech Separation by Composition of Longest Segments
A data-driven approach is presented for improving the performance of separating single-channel mixed speech signals, assuming unknown, arbitrary temporal dynamics. The new approach seeks and separates the longest mixed speech segments which can be accurately matched by composite training segments. Lengthening the mixed speech segments to match reduces the uncertainty of the matching constituent...
The ICSTM+TUM+UP Approach to the 3rd CHiME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models
This paper presents our contribution to the 3rd CHiME Speech Separation and Recognition Challenge. Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE). Networks are trained to predict clean speech as well as noise features from noisy speech features. In addition, the system applies two methods of dereverberati...
Separating Speech from Speech Noise
The main work at Columbia this year has been the development of algorithms for extracting and recognizing speech in nonstationary, noisy environments when only a single microphone channel is available. Our particular approach is based on using trained models to distinguish regions of time-frequency containing speech from nonspeech areas [2], and we have pursued this along several directions: On...
Monaural speech separation and recognition challenge
Robust speech recognition in everyday conditions requires the solution to a number of challenging problems, not least the ability to handle multiple sound sources. The specific case of speech recognition in the presence of a competing talker has been studied for several decades, resulting in a number of quite distinct algorithmic solutions whose focus ranges from modeling both target and compet...